Range Concatenation Grammars for Translation
Abstract
Positive and bottom-up non-erasing binary range concatenation grammars (Boullier, 1998) with at most binary predicates ((2,2)-BRCGs) are an $O(|G|n^6)$-time strict extension of inversion transduction grammars (ITGs) (Wu, 1997). It is shown that (2,2)-BRCGs induce inside-out alignments (Wu, 1997) and cross-serial discontinuous translation units (CDTUs); both phenomena can be shown to occur frequently in many hand-aligned parallel corpora. A CYK-style parsing algorithm is introduced, and induction from alignment structures is briefly discussed.
Range concatenation grammars (RCGs) (Boullier, 1998) have mainly attracted attention in the formal language community, since they recognize exactly the polynomial-time recognizable languages, but recently they have also been argued to be useful for data-driven parsing (Maier and Søgaard, 2008). To our knowledge, Bertsch and Nederhof (2001) present the only work on using RCGs for translation. Both Bertsch and Nederhof (2001) and Maier and Søgaard (2008), however, only make use of so-called simple RCGs, known to be equivalent to linear context-free rewriting systems (LCFRSs) (Weir, 1988; Boullier, 1998). Our strict extension of ITGs, on the other hand, makes use of the ability to copy substrings in RCG derivations, which is one of the things that make RCGs strictly more expressive than LCFRSs. Copying enables us to recognize the intersection of any two translations that we can recognize and to induce the union of any two alignment structures that we can induce.
© 2008. Licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported license (http://creativecommons.org/licenses/by-nc-sa/3.0/). Some rights reserved.
Our extension of ITGs in fact introduces two things: (i) a clause may introduce any number of terminals, which enables us to induce multiword translation units; (ii) a clause may copy a substring, i.e. a clause can associate two or more nonterminals $A_1, \ldots, A_n$ with the same substring and thereby check whether the substring is in the intersection of the languages of the subgrammars with start predicate names $A_1, \ldots, A_n$.
The first point is motivated by studies such as Zens and Ney (2003) and simply reflects that, in order to induce multiword translation units in this kind of synchronous grammar, it is useful to be able to introduce multiple terminals simultaneously. The second point gives us a handle on context-sensitivity. It means that (2,2)-BRCGs can define translations such as $\{\langle a^n b^m c^n d^m, a^n b^m d^m c^n \rangle \mid m, n \geq 0\}$, i.e. a translation of cross-serial dependencies into nested ones; but it also means that (2,2)-BRCGs induce a larger class of alignment structures. In fact, the set of alignment structures that can be induced is closed under union, i.e. any alignment structure can be induced. The last point is of practical interest: it is shown below that phenomena such as inside-out alignments and CDTUs, which can be induced by (2,2)-BRCGs but not by ITGs, occur frequently in many hand-aligned parallel corpora.
1 (2,2)-BRCGs and ITGs
(2,2)-BRCGs are positive RCGs (Boullier, 1998) with binary start predicate names, i.e. $\rho(S) = 2$. In RCG, predicates can be negated (for complementation), and the start predicate name is typically unary. The definition is changed only for aesthetic reasons; a positive RCG with a binary start predicate name S is turned into a positive RCG with a
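As an illustration of the copying mechanism (a clause associating two predicates with the same substring), here is a minimal Python sketch. It assumes a toy positive RCG that is hypothetical and not taken from the paper, and it uses a unary start predicate for simplicity, whereas (2,2)-BRCGs use a binary one over string pairs. The start clause S(X) → AB(X) BC(X) passes one range X to two predicates, so the input is accepted only if it lies in both {a^n b^n c^k} and {a^k b^n c^n}, i.e. in {a^n b^n c^n | n ≥ 0}, which no context-free grammar generates. Predicate arguments are ranges (i, j) over the input, and clauses are evaluated by trying every split of a range. This is a naive memoized top-down evaluator, not the CYK-style algorithm the paper introduces; the function names `recognize`, `only` and `matched` are illustrative.

```python
from functools import lru_cache


def recognize(w: str) -> bool:
    """Toy positive RCG recognizer for {a^n b^n c^n | n >= 0} (hypothetical grammar).

    Clauses, where every argument denotes a range of w:
      S(X)      -> AB(X) BC(X)    # copying: one range, two predicates
      AB(XYZ)   -> P(X, Y) C(Z)   # a^n b^n c^k
      BC(XYZ)   -> A(X) Q(Y, Z)   # a^k b^n c^n
      P(aX, bY) -> P(X, Y)        P(eps, eps) -> eps
      Q(bY, cZ) -> Q(Y, Z)        Q(eps, eps) -> eps
      A(aX) -> A(X)  A(eps) -> eps    C(cX) -> C(X)  C(eps) -> eps
    """
    n = len(w)

    @lru_cache(maxsize=None)
    def only(sym: str, i: int, j: int) -> bool:
        # A / C: the range w[i:j] consists solely of `sym` (peel one symbol per step).
        return i == j or (w[i] == sym and only(sym, i + 1, j))

    @lru_cache(maxsize=None)
    def matched(s1: str, s2: str, i1: int, j1: int, i2: int, j2: int) -> bool:
        # P / Q: a binary predicate over two ranges; it strips s1 from the first
        # range and s2 from the second in lockstep, so both must have equal length.
        if i1 == j1 or i2 == j2:
            return i1 == j1 and i2 == j2
        return w[i1] == s1 and w[i2] == s2 and matched(s1, s2, i1 + 1, j1, i2 + 1, j2)

    def ab(i: int, j: int) -> bool:
        # AB(XYZ): try every split of w[i:j] into adjacent ranges X, Y, Z.
        return any(matched("a", "b", i, k, k, l) and only("c", l, j)
                   for k in range(i, j + 1) for l in range(k, j + 1))

    def bc(i: int, j: int) -> bool:
        # BC(XYZ): same decomposition, checking a^k b^n c^n instead.
        return any(only("a", i, k) and matched("b", "c", k, l, l, j)
                   for k in range(i, j + 1) for l in range(k, j + 1))

    # S(X) -> AB(X) BC(X): the whole input range is copied to both predicates.
    return ab(0, n) and bc(0, n)


if __name__ == "__main__":
    for s in ["", "abc", "aabbcc", "aabbc", "abcabc"]:
        print(repr(s), recognize(s))
```

Here "aabbc" is rejected even though AB alone accepts it: the copied range must satisfy BC as well, which is exactly the intersection check. Since every clause tries all splits of its range, this sketch is only meant for short inputs.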
Similar resources
From Contextual Grammars to Range Concatenation Grammars
Though the field of natural language processing is one of the major aims that has led to the definition of contextual grammars, very little was made on that subject. One reason is certainly the lack of efficient parsers for contextual languages. In this paper we show how some subclasses of contextual grammars can be translated into equivalent range concatenation grammars and can thus be parsed ...
On the Complexity of Alignment Problems in Two Synchronous Grammar Formalisms
The alignment problem for synchronous grammars in its unrestricted form, i.e. whether for a grammar and a string pair the grammar induces an alignment of the two strings, reduces to the universal recognition problem, but restrictions may be imposed on the alignment sought, e.g. alignments may be 1 : 1, island-free or sure-possible sorted. The complexities of 15 restricted alignment problems in ...
An Earley Parsing Algorithm for Range Concatenation Grammars
We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation help...
Parsing Directed Acyclic Graphs with Range Concatenation Grammars
Range Concatenation Grammars (RCGs) are a syntactic formalism which possesses many attractive properties. It is more powerful than Linear Context-Free Rewriting Systems, though this power is not reached to the detriment of efficiency since its sentences can always be parsed in polynomial time. If the input, instead of a string, is a Directed Acyclic Graph (DAG), only simple RCGs can still be pa...
TuLiPA: Towards a Multi-Formalism Parsing Environment for Grammar Engineering
In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Gramma...
Publication date: 2008